Biostatistics For Dummies (Monika Wahi John Pezzullo)

Figure 23-2a shows a typical survival curve. It’s not defined by any algebraic formula. It just

graphs the table of values obtained by a life-table or Kaplan-Meier calculation.

Figure 23-2b shows how the baseline survival curve is flexed by raising every baseline survival

value to a power. You get the lower curve by setting h = 2 and squaring every baseline survival

value. You get the upper curve by setting h = 0.5 and taking the square root of every baseline

survival value. Notice that the two flexed curves keep all the distinctive zigs and zags of the

baseline curve, in that every step occurs at the same time value as it occurs in the baseline curve.
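
If you'd like to try the flexing yourself, here is a minimal Python sketch. The survival values below are made up for illustration (they are not the values behind Figure 23-2), and the function name flex_survival is just a label chosen for this example.

```python
# Hypothetical baseline survival table (illustrative values only,
# not the data plotted in the book's figure).
baseline_times = [0, 1, 2, 3, 4, 5]
baseline_survival = [1.00, 0.90, 0.75, 0.60, 0.40, 0.25]


def flex_survival(survival, h):
    """Raise every baseline survival value to the power h."""
    return [s ** h for s in survival]


lower_curve = flex_survival(baseline_survival, 2)    # h = 2 squares each value
upper_curve = flex_survival(baseline_survival, 0.5)  # h = 0.5 takes the square root

# The time values never change, so every step stays where the baseline put it.
for t, s0, low, high in zip(baseline_times, baseline_survival, lower_curve, upper_curve):
    print(f"t={t}  baseline={s0:.3f}  h=2: {low:.3f}  h=0.5: {high:.3f}")
```

Because only the survival values change and the time values are untouched, the flexed curves step down at exactly the same times as the baseline curve.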

The lower curve represents a group of participants who had a worse survival outcome than

those making up the baseline group. This means that at any instant in time, they were

somewhat more likely to die than a baseline participant at that same moment. Another way

of saying this is that the participants in the lower curve have a higher hazard rate than the

baseline participants.

The upper curve represents participants who had better survival than a baseline person at

any given moment — meaning they had a lower hazard rate.

Obviously, there is a mathematical relationship between the chance of dying at any instant in time,

which is called hazard, and the chance of surviving up to some point in time, which we call survival.

It turns out that raising the survival curve to the h power is exactly equivalent to multiplying the hazard

curve by h. Because every point in the hazard curve is being multiplied by the

same amount, namely h, raising a survival curve to a power is referred to as a proportional

hazards transformation.
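
For readers who want to see where this relationship comes from, here is a short derivation. The symbols are assumptions introduced for this sketch rather than the book's notation: S_0(t) is the baseline survival curve, \lambda_0(t) is the baseline hazard, and S_h(t) and \lambda_h(t) are the flexed versions.

```latex
\begin{align*}
S_0(t) &= \exp\!\left(-\int_0^t \lambda_0(u)\,du\right) \\
S_h(t) = \bigl[S_0(t)\bigr]^{h}
       &= \exp\!\left(-h\int_0^t \lambda_0(u)\,du\right)
        = \exp\!\left(-\int_0^t h\,\lambda_0(u)\,du\right) \\
\lambda_h(t) &= -\frac{d}{dt}\,\ln S_h(t) = h\,\lambda_0(t)
\end{align*}
```

The last line says that the flexed hazard is the baseline hazard multiplied by the same constant h at every time point, which is what "proportional hazards" means.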

But what should the value of h be? The h value varies from one individual to another. Keep in

mind that the baseline curve describes the survival of a perfectly average participant, but no

individual is completely average. You can think of every participant in the data as having her very

own personalized survival curve, based on her very own h value, that provides the best estimate

of that participant’s chance of survival over time.

Seeing how predictor variables influence h

The final piece of the PH regression puzzle is to figure out how the predictor variables influence h,

which influences survival. As you likely know, all regression procedures estimate the values of the

coefficients that make the predicted values agree as much as possible with the observed values. For

PH regression, the software estimates the coefficients of the predictor variables that make the

predicted survival curves agree as much as possible with the observed survival times of each

participant.
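
To make the chain from predictors to h to survival concrete, here is a hedged Python sketch. It assumes the usual proportional-hazards form, in which h is the exponential of a weighted sum of each predictor's departure from the average participant. The coefficients, predictor names, averages, and baseline values are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical regression coefficients and predictor values (illustration only).
coefficients = {"age": 0.04, "treated": -0.7}
mean_values = {"age": 60.0, "treated": 0.5}    # the "average participant" behind the baseline
participant = {"age": 70.0, "treated": 1.0}


def hazard_multiplier(coefs, x, reference):
    """Assumed form: h = exp(sum of b_i * (x_i - reference_i)).

    h equals 1 when the participant matches the reference values exactly.
    """
    linear_predictor = sum(b * (x[name] - reference[name]) for name, b in coefs.items())
    return math.exp(linear_predictor)


h = hazard_multiplier(coefficients, participant, mean_values)

# Hypothetical baseline survival values; the personalized curve is baseline ** h.
baseline_survival = [1.00, 0.90, 0.75, 0.60, 0.40, 0.25]
personal_survival = [s ** h for s in baseline_survival]

print(f"h = {h:.3f}")
print([round(s, 3) for s in personal_survival])
```

A participant whose predictor values all equal the averages gets h = 1, so her personalized curve is the baseline curve itself. Values of h above 1 pull the curve down, and values below 1 push it up.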

How does PH regression determine these regression coefficients? The short answer is,

“You’ll be sorry you asked!” The longer answer is that, like all other kinds of regression, PH

regression is based on maximum likelihood estimation. The software uses the data to build a long,

complicated expression for the probability of one particular individual in the data dying at any